Search results for "Decision tree learning"
Showing 10 of 13 documents
A Simple Method to Predict Blood-Brain Barrier Permeability of Drug-Like Compounds Using Classification Trees
2017
Background: Determining the ability of a compound to penetrate the blood-brain barrier (BBB) is a challenging task; despite numerous efforts to predict or measure BBB passage, existing approaches still have several drawbacks. Methods: The prediction of permeability through the BBB is carried out using classification trees. A large, recently published data set of 497 compounds was selected to develop the tree model. Results: The best model shows an accuracy higher than 87.6% for the training set; the model was also validated using a 10-fold cross-validation procedure and through a test set, achieving accuracy values of 86.1% and 87.9%, respectively. We give a brief explanation, in structural terms, o…
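The classification-tree-plus-cross-validation workflow described above can be sketched with the Python standard library. This is a minimal, generic CART-style tree (binary threshold splits chosen by Gini impurity, depth capped at 3) evaluated by pooled k-fold accuracy. It is not the authors' model: the abstract does not say which tree algorithm or descriptors were used, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None if nothing beats the parent."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict(tree, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def cross_val_accuracy(X, y, k=10, seed=0):
    """Pooled k-fold accuracy: each sample is predicted once by a tree that never saw it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tree = build_tree([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(tree, X[i]) == y[i] for i in fold)
    return correct / len(y)

# Hypothetical toy data: two numeric "descriptors"; the label depends only on the first.
X = [[i, i % 3] for i in range(20)]
y = ["BBB+" if i >= 10 else "BBB-" for i in range(20)]
tree = build_tree(X, y)
train_acc = sum(predict(tree, r) == lab for r, lab in zip(X, y)) / len(y)
cv_acc = cross_val_accuracy(X, y, k=10)
print(train_acc, cv_acc)
```

Reporting both numbers mirrors the training versus cross-validated comparison in the abstract; on real descriptor data, a large gap between them would point to overfitting.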
Prediction of Chromatin Accessibility in Gene-Regulatory Regions from Transcriptomics Data
2017
The epigenetic landscape of cells plays a key role in the establishment of cell-type-specific gene expression programs characteristic of different cellular phenotypes. Different experimental procedures have been developed to obtain insights into the accessible chromatin landscape, including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from the analysis of these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and show highly heterogeneous results, which hampers a trustworthy identification of accessible chromatin…
Improving the Competency of Classifiers through Data Generation
2001
This paper describes a hybrid approach in which sub-symbolic neural networks and symbolic machine learning algorithms are grouped into an ensemble of classifiers. Initially, each classifier determines which portion of the data it is most competent in. The competency information is used to generate new data that are used for further training and prediction. The application of this approach in a difficult-to-learn domain shows an increase in predictive power, in terms of the accuracy and level of competency of both the ensemble and the component classifiers.
Computational identification of chemical compounds with potential anti-Chagas activity using a classification tree
2021
Chagas disease is endemic to 21 Latin American countries and is a major public health problem in that region. Current chemotherapy remains unsatisfactory; consequently, the need to search for new drugs persists. Here we present a new approach to identify novel compounds with potential anti-chagasic action. A large dataset of 584 compounds, obtained from the Drugs for Neglected Diseases initiative, was selected to develop the computational model. Dragon software was used to calculate the molecular descriptors and WEKA software to obtain the classification tree. The best model shows accuracy greater than 93.4% for the training set; the tree was also validated using a 10-fold cross-validation p…
The predictive power of game-related statistics for the final result under the rule changes introduced in the men’s world water polo championship: a …
2019
The objectives of this study were (i) to compare water polo game-related statistics by match outcome (winning and losing teams) after the application of the new rules, and (ii) to develop a classif...
Land cover classification of VHR airborne images for citrus grove identification
2011
Managing land resources using remote sensing techniques is becoming a common practice. However, data analysis procedures should satisfy the high accuracy levels demanded by users (public or private companies and governments) in order to be extensively used. This paper presents a multi-stage classification scheme to update the citrus Geographical Information System (GIS) of the Comunidad Valenciana region (Spain). Spain is the largest citrus fruit producer in Europe and the fourth largest in the world. In particular, citrus fruits represent 67% of the agricultural production in this region, with a total production of 4.24 million tons (campaign 2006–2007). The citrus GIS inventory, created in…
Multifactorial combinations predicting active vs inactive stages of change for physical activity in adolescents considering built environment and psy…
2018
Estimating feature discriminant power in decision tree classifiers
1995
Feature selection is an important phase in pattern recognition system design. Even though there are well-established algorithms that are generally applicable, the requirement of using certain types of criteria for some practical problems makes most of the resulting methods highly inefficient. In this work, a method is proposed to rank a given set of features in the particular case of decision tree classifiers, using the same information generated while constructing the tree. Preliminary results obtained with both synthetic and real data confirm that its performance is comparable to that of sequential methods, with much less computation.
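The abstract does not give the exact ranking criterion, so the sketch below swaps in a common, related technique rather than the 1995 paper's method: mean decrease in Gini impurity, where each split's weighted impurity reduction (already computed during the split search) is credited to the feature that split uses. Function names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity; 0.0 for a pure (or empty) node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def grow(X, y, importance, depth=0, max_depth=3):
    """Grow a Gini tree, adding each chosen split's weighted impurity decrease to importance[feature]."""
    n = len(y)
    parent = gini(y)
    if depth >= max_depth or parent == 0.0:
        return
    best = None  # (weighted child impurity, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or child < best[0]:
                best = (child, f, t)
    if best is None or best[0] >= parent:
        return
    child, f, t = best
    importance[f] += n * (parent - child)  # reuse the quantity computed while building the tree
    left_idx = [i for i in range(n) if X[i][f] <= t]
    right_idx = [i for i in range(n) if X[i][f] > t]
    for idx in (left_idx, right_idx):
        grow([X[i] for i in idx], [y[i] for i in idx], importance, depth + 1, max_depth)

def rank_features(X, y, max_depth=3):
    """Return feature indices sorted by normalized importance, plus the scores themselves."""
    importance = [0.0] * len(X[0])
    grow(X, y, importance, max_depth=max_depth)
    total = sum(importance) or 1.0
    scores = [v / total for v in importance]
    return sorted(range(len(scores)), key=lambda f: -scores[f]), scores

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[i, i % 2] for i in range(8)]
y = [int(i >= 4) for i in range(8)]
order, scores = rank_features(X, y)
print(order, scores)  # [0, 1] [1.0, 0.0]
```

Because the scores fall out of a single tree build, this style of ranking avoids retraining the classifier for every candidate subset, which is where sequential selection methods spend their computation.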
Deterministic Linkage as a Preceding Filter for Other Record Linkage Methods
2015
Deterministic record linkage (RL) is frequently regarded as a rival to more sophisticated strategies like probabilistic RL. We investigate the effect of combining deterministic linkage with other linkage techniques. For this task, we use a simple deterministic linkage strategy as a preceding filter: a data pair is classified as ‘match’ if all values of attributes considered agree exactly, otherwise as ‘nonmatch’. This strategy is separately combined with two probabilistic RL methods based on the Fellegi–Sunter model and with two classification tree methods (CART and Bagging). An empirical comparison was conducted on two real data sets. We used four different partitions into training data a…
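The deterministic rule itself is simple enough to state as code. The sketch below assumes records are dictionaries of strings and treats a missing value as a disagreement. The abstract is truncated before it says exactly how the filter is combined with the other methods, so the pipeline here is an assumption: pairs that agree exactly are accepted as matches, and every other pair is handed to a secondary classifier. `surname_and_dob` is a hypothetical stand-in for a Fellegi–Sunter or CART model.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]

def deterministic_match(a: Record, b: Record, attributes: List[str]) -> bool:
    """Exact agreement on every considered attribute; a missing value never agrees (assumption)."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in attributes)

def link_with_filter(
    pairs: Iterable[Tuple[Record, Record]],
    attributes: List[str],
    secondary: Callable[[Record, Record], str],
) -> List[str]:
    """Accept exact agreements as 'match'; send every other pair to the secondary method."""
    labels = []
    for a, b in pairs:
        if deterministic_match(a, b, attributes):
            labels.append("match")
        else:
            labels.append(secondary(a, b))
    return labels

# Hypothetical secondary rule standing in for a probabilistic or tree-based linker.
def surname_and_dob(a: Record, b: Record) -> str:
    same = a.get("last") == b.get("last") and a.get("dob") == b.get("dob")
    return "match" if same else "nonmatch"

anna = {"first": "Anna", "last": "Meier", "dob": "1980-02-01"}
pairs = [
    (anna, dict(anna)),                                              # exact agreement: filtered
    (anna, {"first": "Ana", "last": "Meier", "dob": "1980-02-01"}),  # typo: left to secondary
    ({"first": "Jon", "last": "Smith", "dob": "1975-07-12"}, anna),
]
labels = link_with_filter(pairs, ["first", "last", "dob"], surname_and_dob)
print(labels)  # ['match', 'match', 'nonmatch']
```

The filter only ever adds matches for exact agreements, so any gain from it depends on how many true matches the secondary method would otherwise have missed.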
Comparing Boosting and Bagging for Decision Trees of Rankings
2021
Decision tree learning is among the most popular and most traditional families of machine learning algorithms. While these techniques excel in being quite intuitive and interpretable, they also suffer from instability: small perturbations in the training data may result in big changes in the predictions. So-called ensemble methods combine the output of multiple trees, which makes the decision more reliable and stable. They have been primarily applied to numeric prediction problems and to classification tasks. In recent years, some attempts to extend ensemble methods to ordinal data can be found in the literature, but no concrete methodology has been provided for preference…
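As a minimal illustration of how bagging combines multiple trees (bootstrap resamples, one tree per resample, majority vote), here is a standard-library sketch using depth-1 trees (decision stumps) on ordinary class labels. It does not handle rankings, which are the paper's actual subject, and it omits boosting; names, parameters such as `n_models=25`, and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """One-split tree minimizing training errors; falls back to a constant leaf."""
    majority = Counter(y).most_common(1)[0][0]
    best_err, best = sum(lab != majority for lab in y), ("leaf", majority)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best_err, best = err, ("split", f, t, l_lab, r_lab)
    return best

def stump_predict(stump, row):
    if stump[0] == "leaf":
        return stump[1]
    _, f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def bagging_fit(X, y, n_models=25, seed=0):
    """Train one stump per bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the ensemble; ties go to the label seen first."""
    votes = Counter(stump_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

X = [[i] for i in range(20)]
y = ["low" if i < 10 else "high" for i in range(20)]
models = bagging_fit(X, y)
print(bagging_predict(models, [0]), bagging_predict(models, [19]))  # low high
```

Individual stumps place their thresholds differently depending on which rows their bootstrap sample drew; the vote smooths over that variation, which is the stabilizing effect the abstract attributes to ensembles.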